Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --preserve-fds, eliminate stray FD being passed into container #2893

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

aidanhs
Copy link

@aidanhs aidanhs commented Aug 24, 2024

#2663 seems to have broken preserve-fds by unconditionally closing everything aside from stdio in the container main process.

I've fixed this by effectively reverting part of that PR. Typically I'd expect it to be a bad idea to revert security fixes, but unfortunately I don't really understand much of the diagnosis at #2663 and can't get it to stack up against my observations of previous and current behavior of youki.

Specifically:

youki does leak a directory file descriptor from the host mount namespace, but it just so happens that the directory is the rootfs of the container

I checked out 04f8f2d (the commit before the fix PR was merged) and couldn't reproduce this. With a default config generated by youki spec and a command of "ls", "-al", "/proc/self/fd" I get the following output:

dr-x------    2 root     root             0 Aug 24 20:17 .
dr-xr-xr-x    9 root     root             0 Aug 24 20:17 ..
lrwx------    1 root     root            64 Aug 24 20:17 0 -> /dev/pts/9
lrwx------    1 root     root            64 Aug 24 20:17 1 -> /dev/pts/9
lrwx------    1 root     root            64 Aug 24 20:17 2 -> /dev/pts/9
ls: /proc/self/fd/3: cannot read link: No such file or directory
lr-x------    1 root     root            64 Aug 24 20:17 3

This is what I'd expect - preserve fds (#177) already closes all the FDs in the init process before execution. FD 3 here is ls opening the directory.

I can reproduce a leak by passing --preserve-fds 10 (I get something at FD 7, which seems to be the rootfs being referenced) - but the fix solves this by just making the preserve-fds flag useless. I fix this by actually closing the culprit directory, and then additionally verifying by hand that passing --preserve-fds 100 does not list any other stray FDs that are being passed down.

In addition, no care is taken to make sure all non-stdio files are O_CLOEXEC

Again I don't understand this - the preserve-fds PR does exactly that by marking everything as O_CLOEXEC before process execution? It is not ideal that passing --preserve-fds leaks something, but I've fixed that in this PR in a more precise way.

In addition, any file descriptors passed to youki are not closed until the container process is executed, meaning that easily-overlooked programming errors by users of youki can lead to these attacks becoming exploitable.

I also don't understand this - the preserve fds PR sets the O_CLOEXEC flag in the init process, before execing the user process. It seems better to do this as late as possible, so fewer file descriptors can slip in? The vulnerability fix puts it in the main process, giving ample opportunity for other FDs to be opened and leaked.

Perhaps there's more detail that would help my confusion in the review in the private repository that would help? Per:

This PR has already been approved by jprendes rumpl yihuaf in private repository.

Edit: oops, sorry for the mentions, those were unintentional

@utam0k utam0k requested review from utam0k and a team August 28, 2024 13:15
@YJDoc2
Copy link
Collaborator

YJDoc2 commented Aug 28, 2024

@aidanhs , failing oci CI -

Running default/default.t
failed to create the container
ERROR libcontainer::process::container_init_process: failed to cleanup extra fds err=Nix(EPERM)
ERROR libcontainer::process::container_intermediate_process: failed to initialize container process: failed syscall
ERROR libcontainer::process::container_main_process: failed to wait for init ready: exec process failed with error error in executing process : failed syscall
ERROR libcontainer::container::builder_impl: failed to run container process exec process failed with error error in executing process : failed syscall
ERROR youki: error in executing command: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
error in executing command: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
Error: exec process failed with error error in executing process : failed syscall

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
exit status 1
error: Recipe `test-oci` failed on line 54 with exit code 1

Signed-off-by: Aidan Hobson Sayers <aidanhs@cantab.net>
@aidanhs
Copy link
Author

aidanhs commented Aug 30, 2024

The failure was a problem with some changes that I made to try and make closing FDs happen later - unfortunately seccomp got in the way. I've now backed out those changes as strictly speaking they were unnecessary.

Have run the tests locally and they seem to pass.

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR! Probably you are right, but I'm not sure. So, may I ask you to write an e2e test to clarify?
https://containers.github.io/youki/developer/e2e/rust_oci_test.html

Also, I've updated the original PR. Thanks for your advice. It makes sense a lot.
#2663

Signed-off-by: Aidan Hobson Sayers <aidanhs@cantab.net>
Test harness additionally needed to support

1. tests that cannot run in parallel
2. tests that need to customise create arguments

Signed-off-by: Aidan Hobson Sayers <aidanhs@cantab.net>
@aidanhs
Copy link
Author

aidanhs commented Sep 2, 2024

There are now three integration tests:

  1. that youki itself is not leaking FDs (i.e. guarding against a hypothetical future vulnerability)
  2. that preserve-fds actually works
  3. that open FDs are not passed down without preserve-fds

This was harder than I expected because:

  1. I needed to extend the test harness to permit customisation of arguments to container create (I did a small amount of tidying here where another test wanted to do this too, but I didn't want to get too distracted)
  2. because this testing is sensitive to open FDs, it cannot be run in parallel with other tests - I have had to extend the test harness to support non-parallel execution
  3. the --preserve-fds flag does not get passed via the spec so I needed to communicate state via the filesystem

Signed-off-by: Aidan Hobson Sayers <aidanhs@cantab.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants